{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# 多层感知机的从零开始实现\n",
    ":label:`sec_mlp_scratch`\n",
    "\n",
    "我们已经在数学上描述了多层感知机（MLP），现在让我们尝试自己实现一个多层感知机。为了与我们之前使用softmax回归（ :numref:`sec_softmax_scratch` ）获得的结果进行比较，我们将继续使用Fashion-MNIST图像分类数据集（ :numref:`sec_fashion_mnist`）。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load ../utils/djl-imports\n",
    "%load ../utils/plot-utils\n",
    "%load ../utils/DataPoints.java\n",
    "%load ../utils/Training.java\n",
    "%load ../utils/Accumulator.java"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import ai.djl.basicdataset.cv.classification.*;\n",
    "import org.apache.commons.lang3.ArrayUtils;"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "int batchSize = 256;\n",
    "\n",
    "FashionMnist trainIter = FashionMnist.builder()\n",
    "        .optUsage(Dataset.Usage.TRAIN)\n",
    "        .setSampling(batchSize, true)\n",
    "        .optLimit(Long.getLong(\"DATASET_LIMIT\", Long.MAX_VALUE))\n",
    "        .build();\n",
    "\n",
    "\n",
    "FashionMnist testIter = FashionMnist.builder()\n",
    "        .optUsage(Dataset.Usage.TEST)\n",
    "        .setSampling(batchSize, true)\n",
    "        .optLimit(Long.getLong(\"DATASET_LIMIT\", Long.MAX_VALUE))\n",
    "        .build();\n",
    "                            \n",
    "trainIter.prepare();\n",
    "testIter.prepare();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 初始化模型参数\n",
    "\n",
    "回想一下，Fashion-MNIST中的每个图像由$28 \\times 28 = 784$个灰度像素值组成。所有图像共分为10个类别。忽略像素之间的空间结构，我们可以将每个图像视为具有784个输入特征和10个类的简单分类数据集。首先，我们将实现一个具有单隐藏层的多层感知机，它包含256个隐藏单元。注意，我们可以将这两个量都视为超参数。通常，我们选择2的若干次幂作为层的宽度。因为内存在硬件中的分配和寻址方式，这么做往往可以在计算上更高效。\n",
    "\n",
    "我们用几个`NDArray`来表示我们的参数。注意，对于每一层我们都要记录一个权重矩阵和一个偏置向量。跟以前一样，我们要为这些参数的损失的梯度分配内存。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "int numInputs = 784;\n",
    "int numOutputs = 10;\n",
    "int numHiddens = 256;\n",
    "\n",
    "NDManager manager = NDManager.newBaseManager();\n",
    "\n",
    "NDArray W1 = manager.randomNormal(\n",
    "                        0, 0.01f, new Shape(numInputs, numHiddens), DataType.FLOAT32, Device.defaultDevice());\n",
    "NDArray b1 = manager.zeros(new Shape(numHiddens));\n",
    "NDArray W2 = manager.randomNormal(\n",
    "                        0, 0.01f, new Shape(numHiddens, numOutputs), DataType.FLOAT32, Device.defaultDevice());\n",
    "NDArray b2 = manager.zeros(new Shape(numOutputs));\n",
    "\n",
    "NDList params = new NDList(W1, b1, W2, b2);\n",
    "\n",
    "for (NDArray param : params) {\n",
    "    param.setRequiresGradient(true);\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 激活函数\n",
    "\n",
    "为了确保我们知道一切是如何工作的，我们将使用最大值函数自己实现ReLU激活函数，而不是直接调用内置的`relu`函数。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "public NDArray relu(NDArray X){\n",
    "    return X.maximum(0f);\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 模型\n",
    "\n",
    "因为我们忽略了空间结构，所以我们使用`reshape`将每个二维图像转换为一个长度为`numInputs`的向量。我们只需几行代码就可以实现我们的模型。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "public NDArray net(NDArray X) {\n",
    "    X = X.reshape(new Shape(-1, numInputs));\n",
    "    NDArray H = relu(X.dot(W1).add(b1));\n",
    "    return H.dot(W2).add(b2);\n",
    "}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 损失函数\n",
    "\n",
    "为了确保数值稳定性，同时由于我们已经从零实现过softmax函数（ :numref:`sec_softmax_scratch` ），因此在这里我们直接使用高级API中的内置函数来计算softmax和交叉熵损失。回想一下我们之前在 :numref:`subsec_softmax-implementation-revisited` 中对这些复杂问题的讨论。我们鼓励感兴趣的读者查看`Loss.SoftmaxCrossEntropyLoss`的源代码，以加深对实现细节的了解。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "Loss loss = Loss.softmaxCrossEntropyLoss();"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 训练\n",
    "\n",
    "幸运的是，多层感知机的训练过程与softmax回归的训练过程完全相同。可以使用和第三章类似的代码来训练模型（参见 :numref:`sec_softmax_scratch` ），将迭代周期数设置为10，并将学习率设置为0.1.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "int numEpochs = Integer.getInteger(\"MAX_EPOCH\", 10);\n",
    "float lr = 0.5f;\n",
    "\n",
    "double[] trainLoss = new double[numEpochs];\n",
    "double[] trainAccuracy = new double[numEpochs];\n",
    "double[] testAccuracy = new double[numEpochs];\n",
    "double[] epochCount = new double[numEpochs];"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为了对学习到的模型进行评估，我们将在一些测试数据上应用这个模型。\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "float epochLoss = 0f;\n",
    "float accuracyVal = 0f;\n",
    "\n",
    "for (int epoch = 1; epoch <= numEpochs; epoch++) {\n",
    "    \n",
    "        System.out.print(\"Running epoch \" + epoch + \"...... \");\n",
    "        // Iterate over dataset\n",
    "        for (Batch batch : trainIter.getData(manager)) {\n",
    "\n",
    "            NDArray X = batch.getData().head();\n",
    "            NDArray y = batch.getLabels().head();\n",
    "\n",
    "            try(GradientCollector gc = Engine.getInstance().newGradientCollector()) {\n",
    "                NDArray yHat = net(X); // net function call\n",
    "\n",
    "                NDArray lossValue = loss.evaluate(new NDList(y), new NDList(yHat));\n",
    "                NDArray l = lossValue.mul(batchSize);\n",
    "                \n",
    "                accuracyVal += Training.accuracy(yHat, y);\n",
    "                epochLoss += l.sum().getFloat();\n",
    "                \n",
    "                gc.backward(l); // gradient calculation\n",
    "            }\n",
    "            \n",
    "            batch.close();\n",
    "            Training.sgd(params, lr, batchSize); // updater\n",
    "        }\n",
    "    \n",
    "        trainLoss[epoch-1] = epochLoss/trainIter.size();\n",
    "        trainAccuracy[epoch-1] = accuracyVal/trainIter.size();\n",
    "\n",
    "        epochLoss = 0f;\n",
    "        accuracyVal = 0f;    \n",
    "        // testing now\n",
    "        for (Batch batch : testIter.getData(manager)) {\n",
    "\n",
    "            NDArray X = batch.getData().head();\n",
    "            NDArray y = batch.getLabels().head();\n",
    "\n",
    "            NDArray yHat = net(X); // net function call\n",
    "            accuracyVal += Training.accuracy(yHat, y);\n",
    "        }\n",
    "    \n",
    "        testAccuracy[epoch-1] = accuracyVal/testIter.size();\n",
    "        epochCount[epoch-1] = epoch;\n",
    "        accuracyVal = 0f;\n",
    "        System.out.println(\"Finished epoch \" + epoch);\n",
    "}\n",
    "\n",
    "System.out.println(\"Finished training!\");"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "String[] lossLabel = new String[trainLoss.length + testAccuracy.length + trainAccuracy.length];\n",
    "\n",
    "Arrays.fill(lossLabel, 0, trainLoss.length, \"train loss\");\n",
    "Arrays.fill(lossLabel, trainAccuracy.length, trainLoss.length + trainAccuracy.length, \"train acc\");\n",
    "Arrays.fill(lossLabel, trainLoss.length + trainAccuracy.length,\n",
    "                trainLoss.length + testAccuracy.length + trainAccuracy.length, \"test acc\");\n",
    "\n",
    "Table data = Table.create(\"Data\").addColumns(\n",
    "    DoubleColumn.create(\"epochCount\", ArrayUtils.addAll(epochCount, ArrayUtils.addAll(epochCount, epochCount))),\n",
    "    DoubleColumn.create(\"loss\", ArrayUtils.addAll(trainLoss, ArrayUtils.addAll(trainAccuracy, testAccuracy))),\n",
    "    StringColumn.create(\"lossLabel\", lossLabel)\n",
    ");\n",
    "\n",
    "render(LinePlot.create(\"\", data, \"epochCount\", \"loss\", \"lossLabel\"),\"text/html\");"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## 小结\n",
    "\n",
    "* 我们看到即使手动实现一个简单的多层感知机也是很容易的。\n",
    "* 然而，如果有大量的层，从零开始实现多层感知机会变得很麻烦（例如，要命名和记录模型的参数）。\n",
    "\n",
    "## 练习\n",
    "\n",
    "1. 在所有其他参数保持不变的情况下，更改超参数`numHiddens`的值，并查看此超参数的变化对结果有何影响。确定此超参数的最佳值。\n",
    "1. 尝试添加更多的隐藏层，并查看它对结果有何影响。\n",
    "1. 改变学习速率会如何影响结果？保持模型结构和其他超参数(包括迭代周期数)不变，学习率设置为多少会带来最好的结果？\n",
    "1. 通过对所有超参数(学习率、迭代周期数、隐藏层数、每层的隐藏单元数)进行联合优化，可以得到的最佳结果是什么？\n",
    "1. 描述为什么涉及多个超参数更具挑战性。\n",
    "1. 如果要构建多个超参数的搜索方法，你能想到的最聪明的策略是什么？\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Java",
   "language": "java",
   "name": "java"
  },
  "language_info": {
   "codemirror_mode": "java",
   "file_extension": ".jshell",
   "mimetype": "text/x-java-source",
   "name": "Java",
   "pygments_lexer": "java",
   "version": "14.0.2+12"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
